Establishing Standards for Licensing and
Certification Tests: Theory vs. Practice


ABSTRACT

While there is a growing body of literature addressing methods and
procedures for establishing minimum passing scores, there is little
information available describing procedures for reexamining cutscores
for revised tests. This paper discusses available standard setting
procedures as applied to licensing examinations and describes the
methods and procedures used in reviewing and revising cutscores for a
teacher licensing test. The procedure described is based on a
modification of the Angoff (1971) procedure applied to tests undergoing
revision.

There is a considerable body of literature discussing approaches
to setting standards on certification and licensing tests. The
available literature provides useful information to test developers.
However, most of this literature assumes that cut scores are set once
and remain unchanged for the life of the testing program. This paper
discusses available standard setting procedures and their applicability
to establishing minimum passing scores for licensing and certification
examinations, and describes procedures used in reviewing and revising
cut scores for a teacher licensing test.

Standard Setting Procedures


One of the central concerns in developing a test for licensing
purposes is establishing a cutscore or minimum passing score for the
examination. Because of the legal and political environment
surrounding decisions to issue a license based on a cutscore, this
represents one of the most significant points in the test development
process. A number of standard setting methods are available to the
practitioner, including Nedelsky (1954), Angoff (1971), Ebel (1972),
Jaeger (1978), and Contrasting Groups and Borderline Groups (Zieky
& Livingston, 1977). However, only a few of these methods have
actually been used in setting cutscores for teacher certification
testing.


Federal guidelines (EEOC Guidelines, 1978) require that cutscores
established for personnel tests bear an empirical and logical
relationship to the job. However, until relatively recently most tests
for licensing teachers did not establish cutscores that were either
empirically derived or that systematically bore a relationship to
successful performance on the job. For example, in a number of states
the use of the NTE with an arbitrarily set cut score was legally
challenged. Cases challenging the use of the NTE with arbitrarily set
cutscores include: United States v. North Carolina (1975); Baker v.
Columbus Municipal Separate School District (1976); and Georgia
Association of Educators v. Jack P. Nix (1976). Where a cut-off score
is used to determine those candidates that are qualified or
unqualified, the user of the test must give sufficient proof that the
cut-off was not established in a capricious or arbitrary manner.

In South Carolina in 1977, it was found that the use of the NTE
resulted in adverse impact against blacks. However, the state decided
to investigate the test, validate it in South Carolina, and set
cutscores in a systematic, empirical fashion. The result was that some
of the NTE tests were validated and approved for use in South Carolina.

Underlying most methods used are the procedures developed by
Nedelsky (1954) and Angoff (1971). These procedures have been
modified, consolidated, lengthened and abbreviated for use in several
states. They are described and discussed below.


Nedelsky (1954). The Nedelsky model is one of the earliest
"formal" cut score procedures. Nedelsky's (1954) approach requires a
panel of judges to review each test item and determine which of the
available response options the "lowest D-student" should be able to
reject as incorrect. Judges then record the reciprocal of the number
of the remaining responses adjacent to the item.

The cutscore is then determined by totalling the reciprocals
obtained. The data across judges is used as a basis for determining
the final cut score. Depending upon the specific application, the
mean or median cut score for the group of judges may be used as the cut
score. A number of modifications of the Nedelsky (1954) procedure have
been employed, most notably the substitution of "minimally competent
person" for "lowest D-student."
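
The Nedelsky computation described above can be sketched in a few lines
of Python. This is a minimal illustration, not code from the original
procedure: the function names and the sample judgments are hypothetical,
and each judgment is assumed to be recorded as a pair (number of
response options, number of options the borderline examinee can reject).

```python
from statistics import mean, median


def nedelsky_item_value(n_options, n_rejected):
    """Reciprocal of the number of options left after rejection."""
    if not 0 <= n_rejected < n_options:
        # The correct option can never be rejected, so at least one remains.
        raise ValueError("n_rejected must be between 0 and n_options - 1")
    return 1.0 / (n_options - n_rejected)


def nedelsky_judge_cutscore(judgments):
    """Sum of the reciprocals over all items for a single judge."""
    return sum(nedelsky_item_value(n, r) for n, r in judgments)


def nedelsky_panel_cutscore(panel, use_median=False):
    """Combine the judges' cut scores with the mean or the median."""
    scores = [nedelsky_judge_cutscore(j) for j in panel]
    return median(scores) if use_median else mean(scores)


# Hypothetical panel: two judges rating four 4-option items.
judge_a = [(4, 2), (4, 3), (4, 0), (4, 1)]
judge_b = [(4, 2), (4, 2), (4, 1), (4, 1)]
panel_cut = nedelsky_panel_cutscore([judge_a, judge_b])
```

Whether the mean or the median is taken across judges is left as a
parameter because the text notes that the choice depends on the
application.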


Angoff (1971). In the Angoff (1971) method, expert judges review
each test item and indicate the probability that a person with minimum
competency can give the correct response. The Angoff procedure is easy
to explain, easy to understand and easy to administer. It is less time
consuming than Nedelsky's (1954) and can be used on open-ended items.
Angoff (1971) describes the methodology as follows:

     . . . ask each judge to state the probability that the
     'minimally acceptable person' would answer each item
     correctly. In effect, the judges would think of a number of
     minimally acceptable persons, instead of only one such
     person, and would estimate the proportion of minimally
     acceptable persons who would answer each item correctly. The
     sum of these probabilities, or proportions, would then
     represent the minimally acceptable score. (Angoff, 1971,
     p. 515)

In some applications, judges are asked to directly estimate the
percentage of individuals they would expect to answer the item
correctly.


Jaeger (1978). The Jaeger (1978) procedure maximizes the
involvement of educational constituencies. In the North Carolina
application, 700 persons convened in groups of 50 to proceed
through the standard setting model. The procedure is as follows:

Judges were first required to take the exam that they would
later rate. For each item judges were asked one of the following
questions:

(1) Should every high school graduate be able to answer this item
    correctly?

(2) If a student does not answer this item, should s/he be denied
    a high school diploma?

Judges next received the results of the above survey questions as
well as actual performance data. With this information, judges were
asked to review and revise their initial judgments as they considered
necessary.

The procedure then calls for recalculation of the judges' ratings,
redistribution of the new ratings, and another judgement. Judges then
received information on the proportion of students who would have
passed or failed, as determined on the basis of the recommended cut-off
scores.

Median scores were calculated by group (type and constituency),
and the passing score was then set at the minimum median score
calculated for a group.

This process is technically straightforward and involves iterative
reviews and the inclusion of normative student data.
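
The final step of the Jaeger procedure, taking the lowest of the group
medians, can be sketched as follows. The group labels and scores are
hypothetical, and the helper that turns a judge's yes/no answers into a
recommended cut-off (a simple count of "yes" answers) is an assumption
made for illustration; the text above does not spell out that step.

```python
from statistics import median


def judge_recommendation(answers):
    """Assumed scoring: a judge's cut-off is the count of 'yes' answers."""
    return sum(1 for answer in answers if answer)


def jaeger_passing_score(recommendations_by_group):
    """Return (passing score, lowest group, all group medians).

    recommendations_by_group maps a group label to the list of final
    recommended cut-offs from the judges in that group.
    """
    medians = {
        group: median(scores)
        for group, scores in recommendations_by_group.items()
        if scores
    }
    if not medians:
        raise ValueError("at least one group must contain recommendations")
    lowest_group = min(medians, key=medians.get)
    return medians[lowest_group], lowest_group, medians


# Hypothetical final-round recommendations on a multi-item exam.
groups = {
    "teachers": [30, 32, 35],
    "parents": [28, 30, 31, 36],
    "administrators": [34, 29, 33],
}
passing_score, lowest_group, group_medians = jaeger_passing_score(groups)
```

Returning the group medians alongside the passing score keeps the
constituency-level results visible, which matters when the lowest
median comes from a small group.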


Revising Cut Scores 


Teacher licensing tests, particularly in those fields where
procedures and content are subject to frequent change, require updating
on a periodic basis. A number of conditions in the testing environment
may require a revision of the examination. These conditions include:

     o  changes in the job content

     o  changing definitions of the minimally competent professional

     o  available data on actual examinee performance

     o  changing consequences of failing the examination

     o  changes in the political climate surrounding the testing
        program (more stringent or lax standards)

     o  changes in the legal environment

     o  changes in the number of licensed personnel required in the
        field

Regardless of the rationale for revising a test, the revision will
require a reconsideration of the cutscore in some fashion. Revising
the cutscore for a teacher licensing test ensures that examinees taking
the original form of the test and those taking the revised form are
scored on an equivalent basis.

Traditionally, adjustments in cutscores required due to test
revision (e.g., item replacement, item revision) have been accomplished
through some form of equating. While equating ensures that scores
are adjusted for changes in test form difficulty, equating will not
account for qualitative changes in test content. To account for
changes in content, expert judgement must be called upon to reevaluate
standards for the minimally competent professional. For example, in a
teacher licensing test a set of 10 revised items may be substituted for
an initial set of 10 items; while these items may be of equivalent
difficulty statistically (p-values, logit values), differences in
content may require differing levels of proficiency; the initial set of
items may contain content that may require 70% mastery to be considered
minimally competent and the set of replacement items may require 80%
mastery to establish minimum competency.
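
The contrast in this example can be made concrete with a short sketch.
The p-values below are invented for illustration (expressed as percent
correct), chosen only so that both 10-item blocks have the same average
difficulty; the 70% and 80% mastery levels are the ones given in the
text.

```python
def mean_percent_correct(p_values):
    """Average item difficulty of a block, in percent correct."""
    return sum(p_values) / len(p_values)


def judged_block_standard(n_items, required_mastery_percent):
    """Items a minimally competent examinee is expected to answer correctly."""
    return n_items * required_mastery_percent / 100


# Hypothetical p-values: the two blocks are statistically equivalent.
original_p = [55, 60, 65, 70, 75, 80, 85, 90, 60, 60]
replacement_p = [50, 65, 70, 70, 70, 80, 85, 90, 60, 60]
same_difficulty = (
    mean_percent_correct(original_p) == mean_percent_correct(replacement_p)
)

# Judged standards from the text: 70% mastery versus 80% mastery.
original_standard = judged_block_standard(10, 70)
replacement_standard = judged_block_standard(10, 80)
shift = replacement_standard - original_standard
```

Under these assumptions an equating based on difficulty alone would
leave the cutscore unchanged, while the judged standard for the block
rises by one item.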

Because revisions in test content, particularly in the case of
criterion-referenced tests, result in changes that cannot solely be
accounted for through statistical equating, the cutscore needs to be
reevaluated using expert judgment. In selecting an appropriate method
for reevaluating the cutscore through expert judgment, a number of
additional factors need to be considered. Most importantly, the
methodology employed should maintain a strong "link" to the original
test since much of the original content is retained on the revised
test. This can be accomplished in two primary ways. First, if the
original cutscore was determined by a panel of experts initially, this
same panel can be recalled to reevaluate the cutscore to maintain
consistency. Second, the methodology employed for reevaluating the
cutscore should, in some form, rely on data collected from the
administration of the original test. In addition to these
considerations, the approach selected for cutscore revision should be
both practical to administer and easily interpretable by the panel of
experts selected.

After exploring the issues discussed above, the cutscore procedure
suggested by Angoff (1971) was selected for the standard setting effort
described below. The Angoff (1971) approach, as applied here, presents
a practical, easily interpretable method for revising a test standard.
Moreover, as applied in this situation, the panel of experts that
worked on the original test development can be reassembled and data on
item performance (item p-values) can be used as a basis for making
judgments.

Cut Score Revision Procedures


Overview. The tests which underwent revision were initially
developed in 1980 to assess teachers' content knowledge in
their fields of specialization. All teachers in the state for which
the tests were developed are required to pass the content knowledge
tests prior to receiving a license to teach. In order to ensure that
the tests remain up-to-date, they are reviewed approximately every
three years by a panel of content experts. In the Fall of 1983, a
subset of the fields included in the testing program were reviewed to
update the tests based on (1) changes occurring in the field and
(2) information gained from the results of the first three years of
test administration.

For each field reviewed, the original test development committee
consisting of practicing educators (public schools and higher
education) was convened. (In some cases additional committee members
were added to replace original committee members unable to participate
or to provide added expertise to the committee.)


Updating procedures. Updating of the tests was accomplished by
having the committees review the pool of questions on the basis of
topicality and based on item statistics obtained from test
administration data. The committees identified out-dated information
or terminology within individual test questions, then revised or
replaced the questions identified as non-topical.

Before conducting the review, committee members were provided a
brief training session. The purpose of the review was explained and
the criteria used to judge the topicality of a test question were
discussed. The committee members were first asked to go through the
item review booklet, item by item, to determine if the content,
terminology or correct answer for any item was not up-to-date. If an
item was judged to be non-topical the committee members were instructed
to circle the item code number in the item review booklet and briefly
describe the nature of the problem. The committee discussed each of
the items identified as non-topical and the necessary corrections were
recorded into a "Master" item review booklet. Then each individual
test question was reviewed in light of the results of the first three
years of test administration; items requiring revision were identified
and revised or replaced. Committee members were referred to the item
statistics which provided the results for each question that had
appeared on a test form. The reviewers were asked to pay particular
attention to those questions for which more than 95% or less than 30%
of the students answered the question correctly or for which a greater
number of examinees selected an incorrect response than the correct
response. These questions were reviewed to determine if revisions were
required. If the committee felt the question would be improved by
revising it, the item number was circled in the item review booklet and
the reason for identifying the item as requiring revision was noted. The
committee then discussed each item identified by one of its members as
needing revision and the necessary corrections were recorded in the
"Master" item review booklet.
Minimum passing score review. In order to gain the information
necessary to re-evaluate the minimum passing scores for the
examination, each committee member was asked to independently judge the
difficulty level of each item based on the procedures suggested by
Angoff (1971).

Each committee member was instructed to envision the minimally
competent entry-level teacher: the individual who has acquired the
basic knowledge in the field to meet the minimum basic standards for
teaching. After reading each item carefully the committee members were
then asked to decide on the percent of minimally competent entry-level
teachers whom they felt should answer the item correctly. To assist
the committee members in making a decision, they were referred to the
item statistics which indicated the percent of examinees who answered
the item correctly for the first three years of test administration;
this information was provided only for those items that had been
included on a test form. Committee members were cautioned that changes
made to the items during the updating portion of the review may
increase or decrease the difficulty level of the items. They recorded
their decisions on the minimum content knowledge rating form which
listed each item by its code number.

The results were analyzed according to procedures suggested by
Angoff (1971). The mean (difficulty) rating across judges was computed
for each item. The sum of the mean ratings provided the preliminary
cutscore for the revised examination. For example, an exam with five
items having mean ratings of 80%, 75%, 85%, 80%, and 70% would have
a preliminary passing score of 3.90 or 78%.

     Example:    .80
                 .75
                 .85
                 .80
                 .70
                ----
                3.90

The results of the first administration were examined and the
preliminary standards were lowered by 3 Standard Errors of Measurement
(SEM) to minimize false negative errors and to maintain consistency
with the procedures used in setting the original cutscores.
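
The calculation described above, item means summed across the test and
then lowered by a multiple of the SEM, can be sketched as follows. The
two judges' ratings are hypothetical values chosen to reproduce the
five item means in the example, and the SEM of 0.5 raw-score points is
an invented figure; the paper does not report the SEM values used.

```python
def preliminary_cutscore(ratings_percent):
    """Return (cutscore in raw-score points, per-item mean ratings).

    ratings_percent holds one list per judge, each giving the percent
    of minimally competent examinees expected to answer every item.
    """
    n_items = len(ratings_percent[0])
    if any(len(judge) != n_items for judge in ratings_percent):
        raise ValueError("every judge must rate every item")
    n_judges = len(ratings_percent)
    item_means = [
        sum(judge[i] for judge in ratings_percent) / n_judges
        for i in range(n_items)
    ]
    return sum(item_means) / 100, item_means


def adjusted_cutscore(preliminary, sem, n_sem=3):
    """Lower the preliminary standard by n_sem standard errors."""
    return preliminary - n_sem * sem


# Hypothetical ratings from two judges on a five-item exam.
ratings = [
    [85, 70, 90, 80, 65],
    [75, 80, 80, 80, 75],
]
cut, item_means = preliminary_cutscore(ratings)   # 3.9 points, i.e. 78%
percent_cut = sum(item_means) / len(item_means)

# Hypothetical SEM; a real value would come from administration data.
final_cut = adjusted_cutscore(cut, sem=0.5)
```

The number of SEMs is kept as a parameter so that the same sketch
covers programs that adopt a different adjustment.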


The procedure described provides a practical, legally defensible,
and technically sound approach to reevaluating cutscores for revised
licensing tests. Other "field-accepted" standard setting methods may
be equally adaptable to establishing standards for revised licensing
tests, and research examining other approaches is currently underway.
Regardless of which approach is employed, when criterion-referenced
licensing tests are revised, the cutscores should be reassessed by
expert judges in a manner that maintains a close link with the original
test.